53 research outputs found

    Pattern matching and pattern discovery algorithms for protein topologies

    Get PDF
    We describe algorithms for pattern matching and pattern learning in TOPS diagrams (formal descriptions of protein topologies). These problems can be reduced to checking for subgraph isomorphism and finding maximal common subgraphs in a restricted class of ordered graphs. We have developed a subgraph isomorphism algorithm for ordered graphs, which performs well on the given set of data. The maximal common subgraph problem then is solved by repeated subgraph extension and checking for isomorphisms. Despite the apparent inefficiency such approach gives an algorithm with time complexity proportional to the number of graphs in the input set and is still practical on the given set of data. As a result we obtain fast methods which can be used for building a database of protein topological motifs, and for the comparison of a given protein of known secondary structure against a motif database

    Consensus clustering and functional interpretation of gene-expression data

    Get PDF
    Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFĪŗB and the unfolded protein response in certain B-cell lymphomas

    Stability-activity tradeoffs constrain the adaptive evolution of RubisCO

    Get PDF
    A well-known case of evolutionary adaptation is that of ribulose-1,5-bisphosphate carboxylase (RubisCO), the enzyme responsible for fixation of CO2 during photosynthesis. Although the majority of plants use the ancestral C3 photosynthetic pathway, many flowering plants have evolved a derived pathway named C4 photosynthesis. The latter concentrates CO2, and C4 RubisCOs consequently have lower specificity for, and faster turnover of, CO2. The C4 forms result from convergent evolution in multiple clades, with substitutions at a small number of sites under positive selection. To understand the physical constraints on these evolutionary changes, we reconstructed in silico ancestral sequences and 3D structures of RubisCO from a large group of related C3 and C4 species. We were able to precisely track their past evolutionary trajectories, identify mutations on each branch of the phylogeny, and evaluate their stability effect. We show that RubisCO evolution has been constrained by stability-activity tradeoffs similar in character to those previously identified in laboratory-based experiments. The C4 properties require a subset of several ancestral destabilizing mutations, which from their location in the structure are inferred to mainly be involved in enhancing conformational flexibility of the open-closed transition in the catalytic cycle. These mutations are near, but not in, the active site or at intersubunit interfaces. The C3 to C4 transition is preceded by a sustained period in which stability of the enzyme is increased, creating the capacity to accept the functionally necessary destabilizing mutations, and is immediately followed by compensatory mutations that restore global stability

    ISPIDER Central: an integrated database web-server for proteomics

    Get PDF
    Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics

    Computation of protein geometry and its applications: Packing and function prediction

    Full text link
    This chapter discusses geometric models of biomolecules and geometric constructs, including the union of ball model, the weigthed Voronoi diagram, the weighted Delaunay triangulation, and the alpha shapes. These geometric constructs enable fast and analytical computaton of shapes of biomoleculres (including features such as voids and pockets) and metric properties (such as area and volume). The algorithms of Delaunay triangulation, computation of voids and pockets, as well volume/area computation are also described. In addition, applications in packing analysis of protein structures and protein function prediction are also discussed.Comment: 32 pages, 9 figure

    Pattern matching and pattern discovery algorithms for protein topologies

    Get PDF
    We describe algorithms for pattern matching and pattern learning in TOPS diagrams (formal descriptions of protein topologies). These problems can be reduced to checking for subgraph isomorphism and finding maximal common subgraphs in a restricted class of ordered graphs. We have developed a subgraph isomorphism algorithm for ordered graphs, which performs well on the given set of data. The maximal common subgraph problem then is solved by repeated subgraph extension and checking for isomorphisms. Despite the apparent inefficiency such approach gives an algorithm with time complexity proportional to the number of graphs in the input set and is still practical on the given set of data. As a result we obtain fast methods which can be used for building a database of protein topological motifs, and for the comparison of a given protein of known secondary structure against a motif database

    Sequences and topology. Genes and structures in context

    No full text

    PFDB: a generic protein family database integrating the CATH domain structure database with sequence based protein family resources

    No full text
    Motivation: The PFDB (Protein Family Database) is a new database designed to integrate protein family-related data with relevant functional and genomic data. It currently manages biological data for three projectsā€”the CATH protein domain database (Orengo et al., 1997; Pearl et al., 2001), the VIDA virus domains database (AlbĆ  et al., 2001) and the Gene3D database (Buchan et al., 2001). The PFDB has been designed to accommodate protein families identified by a variety of sequence based or structure based protocols and provides a generic resource for biological research by enabling mapping between different protein families and diverse biochemical and genetic data, including complete genomes. Results: A characteristic feature of the PFDB is that it has a number of meta-level entities (for example aggregation, collection and inclusion) represented as base tables in the final design. The explicit representation of relationships at the meta-level has a number of advantages, including flexibilityā€”both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design. A potential drawback with this approachā€”poor performance caused by the number of joins across meta-level tablesā€”is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i. The resultant database is both fast and flexible. This paper presents the principles on which the database has been designed and implemented, and describes the current status of the database and query facilities supported
    • ā€¦
    corecore